Learning emergent partial differential equations in a learned emergent space

We propose an approach to learn effective evolution equations for large systems of interacting agents. This is demonstrated on two examples, a well-studied system of coupled normal form oscillators and a biologically motivated example of coupled Hodgkin-Huxley-like neurons. For such types of systems there is no obvious space coordinate in which to learn effective evolution laws in the form of partial differential equations. In our approach, we accomplish this by learning embedding coordinates from the time series data of the system using manifold learning as a first step. In these emergent coordinates, we then show how one can learn effective partial differential equations, using neural networks, that do not only reproduce the dynamics of the oscillator ensemble, but also capture the collective bifurcations when system parameters vary. The proposed approach thus integrates the automatic, data-driven extraction of emergent space coordinates parametrizing the agent dynamics, with machine-learning assisted identification of an emergent PDE description of the dynamics in this parametrization.

This work develops a method to learn an effective PDE from data, even in cases where the data does not originate from a PDE. The first step is to learn effective spatial coordinates, which is accomplished via diffusion maps. In this step, the authors treat the time series of each agent (or mesh point) as a point in a high-dimensional space, and apply the diffusion maps method to the collection of time series, thereby grouping similar time series together in the diffusion map coordinates. The diffusion map coordinates thus smoothly parameterize the time series. In the second step, the authors learn a PDE for the data, where the right-hand side is a function of derivatives with respect to the diffusion map coordinates, and well-documented methods are used to learn the right-hand side. The authors show that modeling transient solutions in this way results in good short- and long-time predictions.
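The first step summarized above can be illustrated with a minimal sketch. It is not the authors' implementation: the synthetic ensemble (uncoupled phase oscillators with frequencies `omega` drawn uniformly from [1, 2]), the median kernel scale, and the helper name `diffusion_map` are all assumptions made for illustration. Each agent's time series is treated as one point in a high-dimensional space, and the leading non-trivial diffusion map coordinate is then compared with the hidden heterogeneity parameter.

```python
import numpy as np

def diffusion_map(X, n_coords=1, eps=None):
    """Leading non-trivial diffusion map coordinates of the rows of X (hypothetical helper)."""
    # Pairwise squared Euclidean distances between the time series (rows of X).
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    if eps is None:
        eps = np.median(d2)  # heuristic kernel scale; an assumption, not a tuned value
    K = np.exp(-d2 / eps)
    # Density normalization (alpha = 1) to reduce the effect of sampling density.
    q = K.sum(axis=1)
    K = K / np.outer(q, q)
    d = K.sum(axis=1)
    # Symmetric matrix with the same spectrum as the row-stochastic Markov matrix.
    S = K / np.sqrt(np.outer(d, d))
    vals, vecs = np.linalg.eigh(S)
    order = np.argsort(vals)[::-1]
    vals, vecs = vals[order], vecs[:, order]
    psi = vecs / np.sqrt(d)[:, None]  # right eigenvectors of the Markov matrix
    # Skip the trivial constant eigenvector (eigenvalue 1).
    return vals[1:n_coords + 1], psi[:, 1:n_coords + 1]

# Synthetic ensemble: agents are stored in random order, with no spatial label.
rng = np.random.default_rng(0)
omega = rng.uniform(1.0, 2.0, size=120)      # hidden heterogeneity (assumed)
t = np.linspace(0.0, 3.0, 200)
phase = np.outer(omega, t)
X = np.hstack([np.cos(phase), np.sin(phase)])  # one row per agent time series

vals, coords = diffusion_map(X, n_coords=1)
phi1 = coords[:, 0]
corr = np.corrcoef(phi1, omega)[0, 1]
print(f"|corr(phi1, omega)| = {abs(corr):.3f}")
```

In this toy setting the coordinate `phi1` is computed from the time series alone, so a strong correlation with `omega` indicates that the embedding has ordered the agents by their hidden parameter. The sign and scale of `phi1` are arbitrary.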
There are some interesting ideas here. The authors show that they can learn a PDE from data that originates from a PDE, as well as from data that does not originate from a PDE with their unique approach to finding "emergent" spatial variables. On the other hand, the application examples are relatively simple, the present work is largely a logical extension of prior studies from the authors, and the presentation is often unclear. Ultimately, I do not view the work as a substantial enough advance to warrant publication in Nat Comm.
Comments:
1. This paper combines previous work to learn effective spatial coordinates (ref. 9) with well-known work to learn the right-hand-side of a system of ODEs, and is essentially a logical extension of ref. 9.
2. Solving a PDE is typically expensive; can the authors give an example where solving a PDE is less expensive than the original system of oscillators? How do the costs compare for the Stuart-Landau example? Moreover, a common way to solve PDEs is to project onto some finite basis. Why not go directly to this finite representation instead of going through a PDE? What are the benefits of going through a PDE?
3. Related to the previous point, the dynamics in the two examples are simply limit cycles. For such simple dynamics, I think it would be better to go straight to a very low-dimensional representation of the dynamics instead of going through a PDE. Does the method work for more complicated dynamics? An example with more complicated dynamics would be beneficial.
4. The authors mention interpretability as a benefit or possible feature of learning a PDE, but I find this to be an unlikely feature when the effective spatial coordinates are learned from data. In the two examples in the paper, the authors were only able to interpret the diffusion map coordinate because they already knew the answer or had a very small set of possible answers.
5. More information is needed about the parameters used in the two examples so that readers could recreate the data if they wanted to. Also, it would be useful to explain why the authors chose the parameter values (and initial and boundary conditions) that they did. Was this simply to get limit cycle behaviour? I also thought that the Methods section was a bit opaque, and think it needs to be expanded and made more explicit.
6. The way boundary conditions are dealt with seems like a rather large weakness to me. This is especially the case for the Stuart-Landau example. From fig. 3d, we can infer that nearly all of the oscillators lie in a range of phi1 values in the boundary corridor illustrated in fig. 3b. The PDE is only learned in the interior region, which ostensibly contains very few oscillators. So, it seems that a PDE is being learned in order to replace very few oscillators, which would be rather wasteful. Is there a way to learn effective boundary conditions instead of the current way of dealing with them?
7. Are the values of the diffusion map coordinate always between -1 and 1, or have they been rescaled and shifted? Of the many potential diffusion map coordinates, how do the authors know which one to pick? How do the authors know how many effective spatial coordinates are needed, and how do they pick the appropriate diffusion map coordinates? Also, what do the authors mean by "independent" (line 231)? Will mixed derivatives show up, and why are they not considered in eq. 7?
8. What advantages do diffusion maps offer for finding the "emergent" spatial variable? The manuscript comments that, if helpful, this can be thought of as similar to keeping the leading component of PCA. Would PCA work? Would other dimension reduction methods work, like t-SNE, LLE, Isomap, undercomplete autoencoders, etc.?
9. It would be helpful to more explicitly explain the fixed boundary conditions. It sounds like the boundary conditions are just fixed to some value and this affects the interior points through the derivative calculations. What happens when the boundary conditions are not enforced? Do solutions blow up? This constraint seems unnecessary with a good approximation of f.

10. The examples explored are no more complicated than periodic orbits. Would this method work for chaotic datasets? The authors mention swarms; agent-based models for these are simple to code up. How about looking at this case?
11. Why is the projection onto the leading modes from SVD important? Does this smooth out higher frequency behavior that causes solutions to blow up with the neural network model?
12. The authors show that they are able to learn an effective local PDE for globally coupled oscillators. Although this seems contradictory, certain types of local PDEs do exhibit global solutions (e.g., parabolic PDEs like the heat equation). Something to this effect should be added to the manuscript.
13. How necessary is it to provide derivatives to the neural network? It would be useful to compare the performance when no derivatives are supplied, then the first, and the second. No derivatives is also useful because it shows the effect of learning the "emergent" spatial variables.
14. In addition to showing the distance between the transient and the true attractor (fig 2f,3c,5d) it would be helpful to see the distance between the true transient and the model to judge how well short-time tracking performs.
15. For the two "emergent" spatial variables, why are derivatives with respect to both variables not needed?
16. There should be another figure like 5a showing lambda, or it could be made into a 3D figure to show lambda.
17. The description of the grid for the two "emergent" spatial variables is very unclear. Why is this region selected? How can interpolation be done in the bottom right corner where no data points exist? Why are a majority of the omega values dropped?
18. I find the abstract to be too indirect. It would be nice if it were made more clear that a neural network is used for the model and diffusion maps are used to reorganize the data. It seems like the key points ought to be: 1. systems of coupled oscillators have no obvious spatial relation 2. "emergent" spatial variables are discovered via diffusion maps 3. in this emergent space partial differential equations are approximated with neural networks.
19. Some typos: "keepinging" on line 140, "knowing knowing" on line 145, "n_{train}=1" looks like it should be "n_{test}=1" on line 426

Reviewer #2 (Remarks to the Author):
The authors present a methodology for learning/describing the evolution of a discrete system through neural networks that use as inputs discrete approximations of PDE kernels. The paper contains several original ideas by the authors (many pioneered and presented by the authors in previous studies) on how to accelerate fine scale simulations by discovering an effective coarse grained PDE. The new "twist", as the authors clearly explain, is the identification of a previously unknown independent variable. I find the presentation of these ideas a very interesting read.
However, right from the start it is clear that they do not learn a PDE but rather a neural net representation of discrete approximations of PDEs. Moreover, they "help" the NN learn the terms of the PDE that are clearly linked to the respective discrete approximation. So there is limited hope that this methodology may work when such information is not available (as it would not be available in most agent-based simulations). What if the "wrong" derivatives had been fed to the NN? Would the method still work? My estimate is that it would not, since after all NNs can only interpolate and not discover dynamics.
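The point about discrete derivative approximations can be made concrete with a small sketch. It is an illustration under stated assumptions, not the method of the manuscript: the data come from an exact solution of the 1D heat equation on a periodic domain with an assumed diffusion coefficient `D`, and ordinary linear least squares is swapped in for the neural network so that the fitted right-hand side can be read off directly. The inputs to the fit are finite-difference approximations of the field and its first two spatial derivatives.

```python
import numpy as np

D = 0.1                       # diffusion coefficient of the synthetic data (assumed)
n_x, n_t = 128, 200
x = np.linspace(0.0, 2.0 * np.pi, n_x, endpoint=False)
t = np.linspace(0.0, 2.0, n_t)
dx, dt = x[1] - x[0], t[1] - t[0]

# Exact heat-equation solution built from three decaying Fourier modes.
amps = {1: 1.0, 2: 0.5, 3: 0.25}
u = sum(a * np.exp(-D * k**2 * t)[:, None] * np.sin(k * x)[None, :]
        for k, a in amps.items())

# Discrete derivative approximations: periodic central differences in space,
# np.gradient (central in the interior, one-sided at the ends) in time.
u_x = (np.roll(u, -1, axis=1) - np.roll(u, 1, axis=1)) / (2.0 * dx)
u_xx = (np.roll(u, -1, axis=1) - 2.0 * u + np.roll(u, 1, axis=1)) / dx**2
u_t = np.gradient(u, dt, axis=0)

# Fit u_t as a linear function of (u, u_x, u_xx); a neural network would
# replace this linear map when the right-hand side is nonlinear.
features = np.column_stack([u.ravel(), u_x.ravel(), u_xx.ravel()])
coef, *_ = np.linalg.lstsq(features, u_t.ravel(), rcond=None)
print(dict(zip(["u", "u_x", "u_xx"], np.round(coef, 4))))
```

One way to probe the question of "wrong" derivatives in this toy setting is to remove the `u_xx` column from `features` and inspect the residual of the fit.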
Furthermore, I am disappointed to see that despite the authors claiming important advances to fields ranging from chemistry to quantum mechanics and fluid flows the results involve a 1D example with rather smooth solutions. The authors claim "dramatic" savings in computation "if" this idea is successful, but I do not see any of these claims being supported by the 1D example presented in this paper.
Despite my concerns with the approach I believe that the paper will gain importance (and I hope that it would prove me wrong) by showcasing a problem that is in one of the following categories:
1. 2D or 3D in space and with complex boundary conditions
2. has complex and non-smooth/decaying solutions (see figure 3c on what is the actual and learned)
3. is truly agent based and a discretization of a PDE that is in the end "rediscovered"
Perhaps ideas deserve as much attention as proof that they work. Hence I would not object to the publication of this paper if it is decided by the journal.
In summary, I am not convinced that this idea would work for anything beyond smooth solutions due to the issues I outline above. If this is the case, what is the value of this new approach over existing methodologies (machine learning and/or other coarse graining procedures)? At the same time, I will be happy to stand corrected and would be glad to see a revised version of the paper that addresses my concerns and, most importantly, tackles problems in one of the 3 categories described above.

Reviewer #3 (Remarks to the Author):
I appreciate what the authors are trying to do here: take the dynamics from a set of coupled agents whose equations of motion are _unknown_, and then use this to derive a data driven PDE to describe the resulting dynamics. For such an approach to be useful, the dynamical laws that are so derived must (i) apply accurately within the domain upon which the model was trained (eg initial conditions), and ideally (ii) also generalize to unseen parameters.
I'm concerned that the choice of model problems (+ parameter ranges) that are chosen in this manuscript are too simplistic, and for that reason the idea doesn't deliver on its promise. The first example where $x$ is the coordinate is especially simplistic --the original PDE already exists in these variables. The second one (Stuart) is getting closer to the complexity where something interesting could be found.
The approach that the authors take is sensible and interesting: they use a data driven method (kernel based PCA) to derive modes from data and then try to use the leading components as a coordinate system to derive the equations. I like this approach very much. There is however a straw man: in the 2nd Stuart example (Fig 3d), for instance, the authors demonstrate a 1-1 mapping between phi_1 and omega_i. Given this mapping, could we not directly derive a PDE for W(phi_1,t) by simply plugging it back into equation 4? The coupling term (K/N\sum_....) induces derivatives in the \phi_1 coordinates. The resulting PDE is on the surface close to that of Eqn 5 except that instead of using neural networks it can be derived directly from the equations. It also has the advantage that the parameter variation in this equation is clear --so it is more generally applicable. Presumably this equation will give rise to the bifurcations explored in Fig 4.
Other comments:
--The fact that all the analysis of the paper is close to a bifurcation point is worrisome. Would the approach make sense for a system where the dynamical behavior was much more complicated--so that perhaps 2 principal components or more are actually needed? For example a simple challenge is to redo the PDE example in eqn 1 for either the KS equation or for the Lorenz 96 model, which shows truly chaotic behavior.
--One major issue with the simplicity of the dynamical systems explored here is that it is unclear how much of what is being demonstrated is memorization. Can you demonstrate that the models work on out of distribution test examples? Choosing different random initial conditions of these models in these regimes does not accomplish this--given the simplicity of the dynamics what happens is highly constrained/has low entropy and hence agreement=memorization.
--Agent based models with complex dynamics abound, much more complicated than those outlined here. For example, one area the authors might wish to explore is COVID agent-based models (eg VaTech, Northeastern,...). The rules for the agents can be quite complicated, and there is an obvious low order dynamical system at the core. Could the present approach actually be useful in that regard--deriving effective dynamics that are predictive for out of distribution examples?
The "dream" behind this work is quite beautiful--but I'm afraid at this point the results could be made much more convincing by choosing better examples.

REVIEWER COMMENTS
Reviewer #1 (Remarks to the Author):
Many issues of detail that were brought up in my original review have been addressed. But the main issues I raised in the original review still remain.
Because the authors are not learning boundary conditions, they are using simulation data from near the boundaries as the BCs for the learned PDE. But if the data is time-periodic for example, then they are applying a periodic forcing to their PDE and it's no surprise that their PDE gives a periodic solution. Furthermore, the model has no predictive capability since we need data near the boundary to find the solution. So (as with any problem governed by a PDE), for a complete *predictive* formulation the boundary conditions are necessary too.
In my review of the original submission I asked about examples with dynamics more complex than time-periodic. Although the authors added an example in the revision, it also has time-periodic dynamics. In their response, the authors write that "Using chaotic examples would render the comparison between the learned model and the true dynamics even more complicated, because one would have to check that the predicted trajectories diverge with the right Lyapunov exponent and that, at the same time, the attractors of both systems coincide." The authors are correct, but nevertheless other studies exist in the literature that do make comparisons between chaotic data and data-driven models thereof. That could be done here as well.
So as noted originally, there are interesting ideas and methods here. But at this point the formalism is incomplete and most importantly not predictive, and even so has only been applied to problems with simple temporal dynamics. I don't see it as appropriate for Nat. Comm.
Reviewer #2 (Remarks to the Author):
The authors have provided further (interesting) discussion of their methodology and one more application example. I appreciate the extra effort even though the example does not belong to any of the categories that I asked for in my first review. Certainly the complexity of the examples can be argued in many different ways. However, I maintain my reservations about the application domains of this methodology. I still find the paper strong in ideas but weak in terms of demonstrating their feasibility to advance the solution of complex problems.
In summary, I recommend publication in Nature Communications on the merit of the ideas presented in the paper. Perhaps applications can follow by these authors or others if this paper gets traction in the community.

Dear Reviewers,
Thank you very much for the feedback on our manuscript "Learning emergent PDEs in a learned emergent space".
We feel we have addressed the point raised by the reviewers by incorporating an example where the dynamics, as requested, is spatio-temporally chaotic (and has periodic boundary conditions, so there is no need either to learn BCs or to apply our "narrow corridor" approach). This dynamical state, so-called spatio-temporal intermittency, appears in the complex Ginzburg-Landau equation for a suitable set of system parameters. Again, this is not an oscillator or multi-agent example, but we treat the recorded time series obtained from simulations, for illustration purposes, as time series from a discrete ensemble, where the dynamics at each grid point corresponds to a single agent.
As we describe in more detail in the article, we find that these "agents" can be systematically embedded in a one-dimensional periodic emergent domain, and we can successfully learn the effective PDE in this emergent domain. Note that in doing so, we
1. show that we can capture chaotic dynamics with our PDE learning approach, as can be observed in the new Figure 2;
2. show that, due to the periodic nature of the domain, we do not have to worry about providing any additional information for boundary conditions (diffusion maps in this case showed us that the domain is periodic, see Figure 2). This means that at least here the dynamics are not slaved to any boundary dynamics;
3. confirm that the discovered domain is one-to-one with the original scrambled one, thus providing another small validation step for the approach.
We completely agree that the issue of machine learning boundary conditions (in effect, determining what boundary/initial data make a problem well-posed) is an important direction for future research. We are actively working on it, but respectfully, this is an entire research world by itself. Here we learned the space and the equation; we are also working on determining what initial/boundary conditions make the problem for this emergent space/operator pair well-posed. We do mention in the Discussion older initial work by our group (Ref. 43, the "baby-bathwater" scheme) on designing computational experiments to explore how many initial conditions might be necessary.
Again, we are very grateful for the reviews received, and for the ideas and suggestions provided. We hope that, with the previous inclusion of the network example and the current inclusion of a spatiotemporally chaotic example, you will consider the paper worthy of publication.